Object compositing based on 2D images is a challenging problem, since it typically involves multiple processing stages, such as color harmonization, geometry correction, and shadow generation, to produce realistic results. Furthermore, annotating paired training data for compositing requires substantial manual effort from professionals and hardly scales. Building on recent advances in generative models, we propose a self-supervised framework for object compositing that leverages the power of conditional diffusion models. Our framework holistically addresses the object compositing task in a single unified model, transforming the viewpoint, geometry, color, and shadow of the generated object while requiring no manual labeling. To preserve the input object's characteristics, we introduce a content adaptor that helps maintain categorical semantics and object appearance. A data augmentation method is further adopted to improve the fidelity of the generator. In a user study on various real-world images, our method outperforms relevant baselines in both the realism and the faithfulness of the synthesized images.
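Because the framework needs no manual labels, training pairs have to be derived from unlabeled images. The sketch below shows one hypothetical way to do that: erase an object's region from an image, perturb the cropped object (a horizontal flip and a brightness change standing in for geometry and color mismatch), and use the original image as the reconstruction target. The function name, the grayscale nested-list image format, and the specific perturbations are all assumptions for illustration, not the paper's augmentation pipeline.

```python
import random
from typing import List, Tuple

Image = List[List[float]]        # grayscale pixels, values in [0, 1]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive

def make_self_supervised_pair(image: Image, box: Box, rng: random.Random):
    """Hypothetical pair builder: (masked background, perturbed crop, box) -> original image."""
    top, left, bottom, right = box
    # Object crop that the model must re-insert into the scene.
    crop = [row[left:right] for row in image[top:bottom]]
    # Perturb the crop so the model has to correct geometry and color on its own.
    if rng.random() < 0.5:
        crop = [row[::-1] for row in crop]  # horizontal flip
    gain = rng.uniform(0.7, 1.3)            # brightness jitter
    crop = [[min(1.0, max(0.0, p * gain)) for p in row] for row in crop]
    # Background with the object region erased (set to 0).
    background = [
        [0.0 if top <= r < bottom and left <= c < right else p for c, p in enumerate(row)]
        for r, row in enumerate(image)
    ]
    inputs = (background, crop, box)
    target = image  # the unmodified image supervises the composite
    return inputs, target

img = [[0.5] * 4 for _ in range(4)]
inputs, target = make_self_supervised_pair(img, (1, 1, 3, 3), random.Random(0))
```

The point of the design is that the supervision signal (the original image) comes for free, so any number of pairs can be generated without annotation.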
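We develop a Bayesian approach to predict a continuous or binary outcome from data collected from multiple sources with a multi-way (i.e., multidimensional tensor) structure. As a motivating example, we consider molecular data from multiple 'omics sources, each measured at multiple developmental time points, as predictors of early iron deficiency (ID) in a rhesus monkey model. We use a linear model with a low-rank structure on the coefficients to capture multi-way dependence, and we model the variance of the coefficients separately for each source to infer their relative contributions. Conjugate priors facilitate an efficient Gibbs sampling algorithm for posterior inference, assuming either a continuous outcome with normal errors or a binary outcome with a probit link. Simulations demonstrate that our model performs as expected in terms of misclassification rates and the correlation of estimated coefficients with the true coefficients, with substantial performance gains from incorporating the multi-way structure and modest gains from accounting for differing signal sizes across sources. Moreover, it provides robust classification of ID monkeys in our motivating application. Software in the form of R code is available at https://github.com/biostatskim/bayesmsmw.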
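To make the low-rank coefficient idea concrete, here is a minimal sketch assuming a single source whose predictors for one subject form a features-by-time-points matrix X, and a rank-1 coefficient array B = w ⊗ v. The function names and toy numbers are hypothetical; the sketch shows only the linear predictor and the probit link, not the authors' per-source variance model or Gibbs sampler.

```python
from statistics import NormalDist
from typing import List

Matrix = List[List[float]]

def rank1_coefficients(w: List[float], v: List[float]) -> Matrix:
    """Outer product B[j][k] = w[j] * v[k]: p*t coefficients from only p + t parameters."""
    return [[wj * vk for vk in v] for wj in w]

def linear_predictor(x: Matrix, w: List[float], v: List[float]) -> float:
    """Inner product <X, B> for one subject's features-by-time matrix X."""
    b = rank1_coefficients(w, v)
    return sum(x[j][k] * b[j][k] for j in range(len(w)) for k in range(len(v)))

# Hypothetical subject: 3 molecular features measured at 2 time points.
x = [[1.0, 2.0],
     [0.0, 1.0],
     [2.0, 0.0]]
w = [0.5, -1.0, 1.0]  # feature loadings
v = [2.0, 1.0]        # time-point loadings
eta = linear_predictor(x, w, v)  # 5.0
prob = NormalDist().cdf(eta)     # probit link: P(y = 1) = Phi(eta)
```

With p features and t time points, a full coefficient array has p·t free entries, while the rank-1 form has p + t; this sharing across the time axis is what lets the model exploit multi-way dependence.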
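Recent studies have made great progress in video matting by extending the success of trimap-based image matting to the video domain. In this paper, we push this task toward a more practical setting and propose the One-Trimap Video Matting network (OTVM), which performs video matting robustly using only one user-annotated trimap. A key to OTVM is the joint modeling of trimap propagation and alpha prediction. Starting from baseline trimap propagation and alpha prediction networks, OTVM combines the two networks with an alpha-trimap refinement module to facilitate information flow. We also propose an end-to-end training strategy to take full advantage of the joint model. Compared with previous decoupled methods, our joint modeling greatly improves the temporal stability of trimap propagation. We evaluate our model on two recent video matting benchmarks, Deep Video Matting and VideoMatting108, and outperform the state of the art by large margins (MSE improvements of 56.4% and 56.7%, respectively). The source code and models are available online: https://github.com/hongje/otvm.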
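Because OTVM couples alpha prediction with trimap propagation, it may help to see how an alpha matte and a trimap relate. The sketch below shows one common, simple way to derive a trimap from an alpha matte by thresholding. The thresholds and the function name are assumptions for illustration; this is not the learned alpha-trimap refinement module described above.

```python
from typing import List

BACKGROUND, UNKNOWN, FOREGROUND = 0, 128, 255  # conventional 8-bit trimap labels

def alpha_to_trimap(alpha: List[List[float]], low: float = 0.05,
                    high: float = 0.95) -> List[List[int]]:
    """Label pixels background (alpha <= low), foreground (alpha >= high), else unknown."""
    trimap = []
    for row in alpha:
        labels = []
        for a in row:
            if a <= low:
                labels.append(BACKGROUND)
            elif a >= high:
                labels.append(FOREGROUND)
            else:
                labels.append(UNKNOWN)
        trimap.append(labels)
    return trimap

alpha = [[0.0, 0.3, 1.0],
         [0.02, 0.97, 0.6]]
trimap = alpha_to_trimap(alpha)  # [[0, 128, 255], [0, 255, 128]]
```

In a propagation setting, a trimap like this could serve as guidance for the next frame; a learned module would replace the fixed thresholds.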
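Stacking increases storage efficiency on shelves, but the resulting lack of visibility and accessibility makes the mechanical search problem of revealing and extracting a target object difficult for robots. In this paper, we extend the lateral-access mechanical search problem to shelves with stacked items and introduce two novel policies, Distribution Area Reduction for Stacked Scenes (DARSS) and Monte Carlo Tree Search for Stacked Scenes (MCTSSS), that use destacking and restacking actions. MCTSSS improves on prior lookahead policies by considering future states after each potential action. Experiments in 1200 simulated and 18 physical trials with a robot equipped with a blade and a suction cup suggest that destacking and restacking actions can reveal the target object with 82–100% success in simulation and 66–100% success in physical experiments, and that these actions are critical for searching densely packed shelves. In simulation experiments, both policies outperform a baseline and achieve similar success rates, but take more steps than an oracle policy with full state information. In both simulation and physical experiments, DARSS outperforms MCTSSS in the median number of steps needed to reveal the target, but MCTSSS has a higher success rate in physical experiments, suggesting robustness to perception noise. See https://sites.google.com/berkeley.edu/stax-ray for supplementary material.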
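MCTSSS is described here only as a Monte Carlo tree search that looks ahead over future states. For readers unfamiliar with MCTS, the sketch below shows the standard UCB1 (UCT) rule that MCTS variants commonly use to choose which action to explore next. Whether MCTSSS uses exactly this rule and exploration constant is an assumption, and the action names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class ChildStats:
    action: str          # hypothetical action label, e.g. "destack_A"
    visits: int          # number of simulations that went through this child
    total_reward: float  # sum of rollout rewards (e.g. 1.0 when the target was revealed)

def ucb1_score(child: ChildStats, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Mean reward plus an exploration bonus that shrinks as the child is visited more."""
    if child.visits == 0:
        return math.inf  # try every action at least once
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def select_action(children: List[ChildStats]) -> str:
    parent_visits = sum(ch.visits for ch in children)
    return max(children, key=lambda ch: ucb1_score(ch, parent_visits)).action

children = [ChildStats("destack_A", 10, 7.0), ChildStats("push_B", 10, 3.0)]
best = select_action(children)  # "destack_A": equal visits, higher mean reward
```

The exploration term is what lets a lookahead policy keep sampling less-tried actions, such as a restack, instead of committing early to the action with the best current average.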
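In solvent-based carbon capture systems (CCSs), CO2 capture efficiency depends critically on the gas-solvent interfacial area (IA), which makes maximizing IA a foundational challenge in CCS design. While the IA associated with a particular CCS design can be estimated via a computational fluid dynamics (CFD) simulation, using CFD to derive the IAs associated with numerous CCS designs is prohibitively costly. Fortunately, previous work such as Deep Fluids (DF) (Kim et al., 2019) shows that large simulation speedups are achievable by replacing CFD simulators with neural network (NN) surrogates that faithfully mimic the CFD simulation process. This raises the possibility of a fast, accurate replacement for a CFD simulator, and therefore of efficiently approximating the IAs required for CCS design optimization. Here, we build on the DF approach to develop surrogates that can be successfully applied to our complex carbon-capture CFD simulations. Our optimized DF-style surrogates produce large speedups (4000x) while achieving IA relative errors as low as 4% on unseen CCS configurations that lie within the range of the training configurations. This hints at the promise of NN surrogates for our CCS design optimization problem. Nonetheless, DF has inherent limitations with respect to CCS design (e.g., limited transferability of trained models to new CCS packings). We conclude with ideas for addressing these challenges.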
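The surrogate idea, and why accuracy is reported only for configurations within the training range, can be illustrated with a deliberately simple stand-in. The sketch below swaps the NN surrogate for piecewise-linear interpolation over a single hypothetical design parameter, uses a toy function in place of CFD-computed IA values, and reports the relative error |predicted - true| / |true|. None of these choices come from the paper.

```python
import bisect
from typing import List

class InterpolatingSurrogate:
    """Piecewise-linear surrogate over one design parameter (stand-in for an NN surrogate)."""

    def __init__(self, params: List[float], outputs: List[float]):
        pairs = sorted(zip(params, outputs))
        self.xs = [p for p, _ in pairs]
        self.ys = [o for _, o in pairs]

    def predict(self, x: float) -> float:
        if not self.xs[0] <= x <= self.xs[-1]:
            # Outside the training configurations there is nothing to interpolate from.
            raise ValueError("query lies outside the range of training configurations")
        i = bisect.bisect_left(self.xs, x)
        if self.xs[i] == x:
            return self.ys[i]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def relative_error(predicted: float, true: float) -> float:
    return abs(predicted - true) / abs(true)

# Toy function x**2 stands in for the expensive CFD-computed IA at design parameter x.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
surrogate = InterpolatingSurrogate(train_x, [x ** 2 for x in train_x])
pred = surrogate.predict(2.5)         # 6.5
err = relative_error(pred, 2.5 ** 2)  # 0.04, i.e. 4%
```

Refusing to extrapolate mirrors the caveat in the abstract: reported errors apply to unseen configurations inside the training range, and transfer to genuinely new designs (such as new packings) is a separate problem.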
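By moving from massive antennas to antenna surfaces for software-defined wireless systems, reconfigurable intelligent surfaces (RISs) rely on arrays of unit cells to control the scattering and reflection profiles of signals, mitigating propagation loss and multipath attenuation and thereby improving coverage and spectral efficiency. In this paper, covert communication is considered in the presence of an RIS. While a transmission boosted by the RIS is ongoing, both the intended receiver and an eavesdropper individually try to detect this transmission using their own deep neural network (DNN) classifiers. The RIS interaction vector is designed by balancing two (potentially conflicting) objectives: focusing the transmitted signal on the receiver and keeping it away from the eavesdropper. To further support covert communication, adversarial perturbations are added to the transmitter's signal to fool the eavesdropper's classifier while keeping the effect on the receiver low. Results from different network topologies show that the adversarial perturbation and the RIS interaction vector can be jointly designed to effectively increase the signal detection accuracy at the receiver while reducing the detection accuracy at the eavesdropper, enabling covert communication.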
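The trade-off in the RIS interaction vector can be written as a single objective, for example the received power at the receiver minus a weight λ times the received power at the eavesdropper. The sketch below assumes that objective, narrowband cascaded channel coefficients per RIS element, no direct path, and a small set of discrete phase shifts searched exhaustively. The channels, λ, and the objective form are illustrative assumptions, not the paper's formulation.

```python
import cmath
import itertools
import math
from typing import Sequence, Tuple

PHASES = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]  # hypothetical 2-bit phase shifter

def received_power(cascaded: Sequence[complex], phases: Sequence[float]) -> float:
    """|sum_n h_n * exp(j * theta_n)|^2 for per-element cascaded channels h_n (no direct path)."""
    return abs(sum(h * cmath.exp(1j * t) for h, t in zip(cascaded, phases))) ** 2

def design_interaction_vector(h_rx: Sequence[complex], h_eve: Sequence[complex],
                              lam: float) -> Tuple[Tuple[float, ...], float]:
    """Exhaustively pick the phase vector maximizing P_rx - lam * P_eve."""
    best_phases, best_value = None, -math.inf
    for phases in itertools.product(PHASES, repeat=len(h_rx)):
        value = received_power(h_rx, phases) - lam * received_power(h_eve, phases)
        if value > best_value:
            best_phases, best_value = phases, value
    return best_phases, best_value

# Toy 2-element RIS: the receiver's paths add coherently when the eavesdropper's cancel.
phases, value = design_interaction_vector([1, 1], [1, -1], lam=1.0)
```

With λ = 0 the search simply maximizes receiver power; increasing λ trades some receiver gain for lower eavesdropper exposure. Exhaustive search grows as 4^N, so it is only viable for a handful of elements.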
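This paper presents channel-aware adversarial attacks against deep learning-based wireless signal classifiers. A transmitter sends signals with different modulation types, and each receiver uses a deep neural network to classify its over-the-air received signals into modulation types. Meanwhile, an adversary transmits an adversarial perturbation (subject to a power budget) to fool the receivers into misclassifying signals that are received as the superposition of the transmitted signal and the adversarial perturbation. First, these evasion attacks are shown to fail when channels are not considered in designing the adversarial perturbations. Then, realistic attacks are presented by considering the channel effects from the adversary to each receiver. After showing that a channel-aware attack is selective (i.e., it affects only the receiver whose channel is considered in the perturbation design), a broadcast adversarial attack is presented that crafts a common adversarial perturbation to simultaneously fool the classifiers at different receivers. The major vulnerability of modulation classifiers to over-the-air adversarial attacks is shown by accounting for different levels of information available about the channel, the transmitter input, and the classifier model. Finally, a certified defense based on randomized smoothing, which augments the training data with noise, is introduced to make the modulation classifier robust to adversarial perturbations.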
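Randomized smoothing classifies an input by majority vote of a base classifier over many Gaussian-noised copies of that input. The sketch below illustrates only that prediction step, using a hypothetical threshold classifier on a toy real-valued feature vector. It omits the noise-augmented training and the certified radius computation, and the classifier, σ, and sample count are assumptions.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

def toy_base_classifier(x: Sequence[float]) -> int:
    """Hypothetical stand-in for a DNN modulation classifier: class 1 if the sum is positive."""
    return 1 if sum(x) > 0 else 0

def smoothed_predict(base: Callable[[Sequence[float]], int], x: Sequence[float],
                     sigma: float, num_samples: int, seed: int = 0) -> int:
    """Return the class the base classifier predicts most often under N(0, sigma^2) noise."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(num_samples):
        noisy: List[float] = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base(noisy)] += 1
    return votes.most_common(1)[0][0]

label = smoothed_predict(toy_base_classifier, [3.0, 3.0], sigma=0.5, num_samples=1000)
```

The smoothed classifier's decision changes only if a perturbation shifts the majority vote, which is what makes a certified robustness bound possible; training on noise-augmented data keeps the base classifier accurate on the noisy copies it votes over.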
Remote sensing imagery provides comprehensive views of the Earth, where different sensors collect complementary data at different spatial scales. Large pretrained models are commonly fine-tuned on imagery that is heavily augmented to mimic different conditions and scales, and the resulting models are used for various tasks with imagery from a range of spatial scales. Such models overlook scale-specific information in the data. In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. Scale-MAE pretrains a network by masking an input image at a known input scale, where the area of the Earth covered by the image, rather than the image resolution, determines the scale of the ViT positional encoding. Scale-MAE encodes the masked image with a standard ViT backbone and then decodes it through a bandpass filter to reconstruct low/high-frequency images at lower/higher scales. We find that tasking the network with reconstructing both low- and high-frequency images leads to robust multiscale representations for remote sensing imagery. Scale-MAE achieves an average non-parametric kNN classification improvement of $5.0\%$ over the current state of the art across eight remote sensing datasets, and obtains a $0.9$ to $3.8$ mIoU improvement on the SpaceNet building segmentation transfer task across a range of evaluation scales.
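One way to make the positional encoding depend on ground coverage rather than pixel count is to scale the position index by the ratio of the image's ground sample distance (GSD) g to a reference GSD G inside a standard sinusoidal encoding. The sketch below assumes that form for a single axis; the function name and parameters are illustrative and should be checked against the Scale-MAE paper before relying on them.

```python
import math
from typing import List

def gsd_positional_encoding(num_positions: int, dim: int, gsd: float,
                            ref_gsd: float) -> List[List[float]]:
    """1-D sinusoidal encoding whose position index is scaled by gsd / ref_gsd (assumed form)."""
    if dim % 2:
        raise ValueError("dim must be even")
    scale = gsd / ref_gsd
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim // 2):
            angle = scale * pos / (10000 ** (2 * i / dim))
            row.extend([math.sin(angle), math.cos(angle)])  # interleaved sin/cos pairs
        table.append(row)
    return table

coarse = gsd_positional_encoding(4, 8, gsd=1.0, ref_gsd=1.0)  # 1 m per pixel
fine = gsd_positional_encoding(4, 8, gsd=0.5, ref_gsd=1.0)    # 0.5 m per pixel
```

Under this form, pixel 2 of the 0.5 m image and pixel 1 of the 1 m image both lie 1 m from the origin and receive identical encodings, so the encoding tracks ground distance rather than pixel index.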
Traditional screening practices for anxiety and depression impede effective monitoring and treatment of these conditions. However, recent advances in NLP and speech modelling allow textual, acoustic, and hand-crafted language-based features to jointly form the basis of future mental health screening and condition detection. Speech is a rich and readily available source of insight into an individual's cognitive state, and by leveraging different aspects of speech, we can develop new digital biomarkers for depression and anxiety. To this end, we propose a multi-modal system for screening for depression and anxiety from self-administered speech tasks. The proposed model integrates deep-learned features from audio and text with hand-crafted features informed by clinically validated domain knowledge. We find that augmenting hand-crafted features with deep-learned features improves our overall classification F1 score, compared with a baseline of hand-crafted features alone, from 0.58 to 0.63 for depression and from 0.54 to 0.57 for anxiety. These findings suggest that speech-based biomarkers for depression and anxiety hold significant promise for the future of digital health.
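As a concrete picture of augmenting hand-crafted features with deep-learned features, and of the reported metric, the sketch below assumes simple early fusion: each feature group is z-scored across samples and the groups are concatenated per sample. It also computes a binary F1 score. The fusion scheme, function names, and toy values are assumptions; the abstract does not specify how the features are combined or which F1 averaging is used.

```python
from statistics import mean, pstdev
from typing import List, Sequence

def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each feature column across samples (zero-variance columns become 0)."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)] for row in rows]

def fuse_features(handcrafted: List[List[float]], deep: List[List[float]]) -> List[List[float]]:
    """Early fusion: per-sample concatenation of separately standardized feature groups."""
    return [h + d for h, d in zip(zscore_columns(handcrafted), zscore_columns(deep))]

def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

fused = fuse_features([[1.0], [3.0]], [[10.0, 5.0], [30.0, 5.0]])
```

Standardizing each group separately keeps one group with large raw magnitudes, such as deep embeddings, from dominating a downstream classifier purely by scale.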
This paper addresses kinodynamic motion planning for non-holonomic robots in dynamic environments with both static and dynamic obstacles, a challenging problem that still lacks a universal solution. One promising approach is to decompose the problem into smaller sub-problems and combine the local solutions into a global one. The crux of any planning method for non-holonomic robots is the generation of motion primitives that solve the local planning sub-problems. In this work, we introduce a novel learnable steering function (policy) that takes into account the kinodynamic constraints of the robot and both static and dynamic obstacles. This policy is efficiently trained via policy optimization. Empirically, we show that our steering function generalizes well to unseen problems. We then plug the trained policy into sampling-based and lattice-based planners and evaluate the resulting POLAMP algorithm (Policy Optimization that Learns Adaptive Motion Primitives) in a range of challenging setups involving a car-like robot operating in obstacle-rich parking-lot environments. We show that POLAMP plans collision-free kinodynamic trajectories with success rates higher than 92% when 50 simultaneously moving obstacles populate the environment, outperforming state-of-the-art competitors.
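A steering function for a car-like robot proposes controls whose effect is governed by the robot's kinematics, and every candidate primitive must avoid obstacles that move over time. The sketch below rolls out the standard kinematic bicycle model under a fixed control and checks the time-indexed trajectory against a circular obstacle moving at constant velocity. The model, wheelbase, time step, and obstacle representation are assumptions for illustration, not POLAMP's learned policy or its exact dynamics.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class State:
    x: float
    y: float
    theta: float  # heading (rad)

def bicycle_step(s: State, v: float, steer: float, wheelbase: float, dt: float) -> State:
    """One Euler step of the kinematic bicycle model."""
    return State(
        x=s.x + v * math.cos(s.theta) * dt,
        y=s.y + v * math.sin(s.theta) * dt,
        theta=s.theta + v / wheelbase * math.tan(steer) * dt,
    )

def rollout(s: State, v: float, steer: float, steps: int,
            wheelbase: float = 2.5, dt: float = 0.1) -> List[State]:
    traj = [s]
    for _ in range(steps):
        traj.append(bicycle_step(traj[-1], v, steer, wheelbase, dt))
    return traj

def collides_with_moving_obstacle(traj: List[State], start: Tuple[float, float],
                                  velocity: Tuple[float, float], radius: float,
                                  dt: float = 0.1) -> bool:
    """Compare each state at time k*dt with a disk obstacle moving at constant velocity."""
    for k, s in enumerate(traj):
        ox = start[0] + velocity[0] * k * dt
        oy = start[1] + velocity[1] * k * dt
        if math.hypot(s.x - ox, s.y - oy) <= radius:
            return True
    return False

traj = rollout(State(0.0, 0.0, 0.0), v=1.0, steer=0.0, steps=10)
```

Checking states against the obstacle's position at the same time step, rather than against a static snapshot, is what distinguishes planning among dynamic obstacles from the static case.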